-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-42049][SQL][FOLLOWUP] Always filter away invalid ordering/partitioning #40137
Conversation
cc @peter-toth |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @sunchao , @huaxingao , @viirya , too.
@@ -114,7 +114,11 @@ trait AliasAwareQueryOutputOrdering[T <: QueryPlan[T]] | |||
} | |||
}.takeWhile(_.isDefined).flatten.toSeq | |||
} else { | |||
orderingExpressions | |||
// Make sure the returned ordering are valid (only reference output attributes of the current |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Thanks for this follow-up!
But, I'm not sure this part is correct as sortOrder.children
should be filtered with _.references.subsetOf(outputSet)
separately.
E.g. this test (is a bit artifical though) fails now:
test("SPARK-42049: Improve AliasAwareOutputExpression - no alias but still prune expressions 2") {
withSQLConf(SQLConf.OPTIMIZER_EXCLUDED_RULES.key ->
Seq(CollapseProject.ruleName, ColumnPruning.ruleName).mkString(",")) {
val df = spark.range(2).select($"id" as "a", $"id" as "b").select($"a")
val outputOrdering = df.queryExecution.optimizedPlan.outputOrdering
assert(outputOrdering.size == 1)
assert(outputOrdering.head.child.asInstanceOf[Attribute].name == "a")
assert(outputOrdering.head.sameOrderExpressions.size == 0)
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point!
Hmm, the failed test seems a related one. |
Oh is it still failing? |
I think it was failing due to latest commit. |
The failed test checks invalid ordering and I've updated it. |
…itioning ### What changes were proposed in this pull request? This is a follow-up of #37525 . When the project list has aliases, we go to the `projectExpression` branch which filters away invalid partitioning/ordering that reference non-existing attributes in the current plan node. However, this filtering is missing when the project list has no alias, where we directly return the child partitioning/ordering. This PR fixes it. ### Why are the changes needed? to make sure we always return valid output partitioning/ordering. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new tests Closes #40137 from cloud-fan/alias. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 72922ad) Signed-off-by: Dongjoon Hyun <[email protected]>
Merged to master/3.4. |
…itioning ### What changes were proposed in this pull request? This is a follow-up of apache#37525 . When the project list has aliases, we go to the `projectExpression` branch which filters away invalid partitioning/ordering that reference non-existing attributes in the current plan node. However, this filtering is missing when the project list has no alias, where we directly return the child partitioning/ordering. This PR fixes it. ### Why are the changes needed? to make sure we always return valid output partitioning/ordering. ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? new tests Closes apache#40137 from cloud-fan/alias. Authored-by: Wenchen Fan <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]> (cherry picked from commit 72922ad) Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?
This is a follow-up of #37525 . When the project list has aliases, we go to the
projectExpression
branch which filters away invalid partitioning/ordering that reference non-existing attributes in the current plan node. However, this filtering is missing when the project list has no alias, where we directly return the child partitioning/ordering.This PR fixes it.
Why are the changes needed?
to make sure we always return valid output partitioning/ordering.
Does this PR introduce any user-facing change?
no
How was this patch tested?
new tests